

On Reward-Free Reinforcement Learning with Linear Function Approximation

Neural Information Processing Systems

During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to compute a near-optimal policy.
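The two-phase protocol described above can be sketched on a toy tabular MDP. This is an illustrative sketch, not the paper's linear-function-approximation algorithm: a chain MDP, random exploration without rewards, then finite-horizon value iteration on the learned empirical model once a reward function is revealed.

```python
import random
from collections import defaultdict

# Toy chain MDP: states 0..4, actions -1/+1 (hypothetical illustration
# of the reward-free two-phase protocol, not the paper's algorithm).
N_STATES, ACTIONS, HORIZON = 5, (-1, +1), 20

def step(s, a):
    return max(0, min(N_STATES - 1, s + a))

# --- Exploration phase: collect transitions without any reward signal ---
counts = defaultdict(lambda: defaultdict(int))
rng = random.Random(0)
for _ in range(2000):
    s = rng.randrange(N_STATES)
    a = rng.choice(ACTIONS)
    counts[(s, a)][step(s, a)] += 1

# --- Planning phase: a reward function is revealed; plan on the learned model ---
def plan(reward):
    V = [0.0] * N_STATES
    policy = [ACTIONS[0]] * N_STATES
    for _ in range(HORIZON):  # finite-horizon value iteration
        newV = [0.0] * N_STATES
        for s in range(N_STATES):
            best = None
            for a in ACTIONS:
                total = sum(counts[(s, a)].values())
                if total == 0:
                    nxt = V[s]  # unvisited pair: assume a self-loop
                else:
                    nxt = sum(c / total * V[s2] for s2, c in counts[(s, a)].items())
                q = reward(s) + nxt
                if best is None or q > best:
                    best, policy[s] = q, a
            newV[s] = best
        V = newV
    return V, policy

# Reward revealed only now: 1 at the rightmost state, 0 elsewhere.
V, pi = plan(lambda s: 1.0 if s == N_STATES - 1 else 0.0)
```

Because the same exploration data can be reused, any reward function revealed later can be planned against without collecting new samples.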





Macroscopic EEG Reveals Discriminative Low-Frequency Oscillations in Plan-to-Grasp Visuomotor Tasks

Cetera, Anna, Ghafoori, Sima, Rabiee, Ali, Farhadi, Mohammad Hassan, Shahriari, Yalda, Abiri, Reza

arXiv.org Artificial Intelligence

Abstract--Objective: The vision-based grasping brain network integrates visual perception with cognitive and motor processes for visuomotor tasks. While invasive recordings have successfully decoded localized neural activity related to grasp type planning and execution, macroscopic neural activation patterns captured by noninvasive electroencephalography (EEG) remain far less understood. Methods: We introduce a novel vision-based grasping platform to investigate grasp-type-specific (precision, power, no-grasp) neural activity across large-scale brain networks using EEG neuroimaging. The platform isolates grasp-specific planning from its associated execution phases in naturalistic visuomotor tasks, where the Filter-Bank Common Spatial Pattern (FBCSP) technique was designed to extract discriminative frequency-specific features within each phase. Support vector machine (SVM) classification discriminated binary (precision vs. power, grasp vs. no-grasp) and multiclass (precision vs. power vs. no-grasp) scenarios for each phase, and was compared against traditional Movement-Related Cortical Potential (MRCP) methods. Results: Low-frequency oscillations (0.5-8 Hz) carry grasp-related information established during planning and maintained throughout execution, with consistent classification performance across both phases (75.3-77.8%). Higher-frequency activity (12-40 Hz) showed phase-dependent results with 93.3% accuracy for grasp vs. no-grasp classification but 61.2% for precision vs. power discrimination. Feature importance using SVM coefficients identified discriminative features within frontoparietal networks during planning and motor networks during execution. Conclusion: This work demonstrated the role of low-frequency oscillations in decoding grasp type during planning using noninvasive EEG. Significance: These findings provide a foundation toward scalable, intention-driven Brain-Machine-Interface (BMI) control strategies.
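The bandpass-filter / CSP / log-variance / classifier pipeline named in the abstract can be sketched on synthetic data. Everything here is an assumption for illustration (channel count, the 0.5-8 Hz band, FFT-based filtering, a nearest-mean classifier in place of the paper's SVM); it shows the shape of an FBCSP-style pipeline, not the authors' configuration.

```python
import numpy as np

# Hypothetical FBCSP-style sketch: bandpass -> CSP spatial filters ->
# log-variance features -> nearest-class-mean classifier, on synthetic
# two-class "EEG" epochs. All parameters are illustrative assumptions.
rng = np.random.default_rng(0)
FS, N_CH, N_T = 128, 4, 256

def make_epoch(cls):
    # Class difference: which channel carries a strong 6 Hz rhythm.
    x = 0.5 * rng.standard_normal((N_CH, N_T))
    t = np.arange(N_T) / FS
    x[cls] += 2.0 * np.sin(2 * np.pi * 6 * t + rng.uniform(0, 2 * np.pi))
    return x

def bandpass(x, lo, hi):
    # Crude FFT-mask bandpass filter (real pipelines would use IIR/FIR filters).
    X = np.fft.rfft(x, axis=-1)
    f = np.fft.rfftfreq(x.shape[-1], 1 / FS)
    X[..., (f < lo) | (f > hi)] = 0
    return np.fft.irfft(X, n=x.shape[-1], axis=-1)

def csp_filters(class0, class1, n_pairs=1):
    # Joint diagonalisation of the two class covariances (standard CSP).
    cov = lambda epochs: np.mean([e @ e.T / e.shape[1] for e in epochs], axis=0)
    C0, C1 = cov(class0), cov(class1)
    d, U = np.linalg.eigh(C0 + C1)          # whiten the composite covariance
    P = np.diag(d ** -0.5) @ U.T
    _, W = np.linalg.eigh(P @ C0 @ P.T)     # diagonalise whitened class-0 cov
    filt = W.T @ P
    return np.vstack([filt[:n_pairs], filt[-n_pairs:]])  # extreme components

def features(epoch, filt):
    v = np.var(filt @ epoch, axis=1)
    return np.log(v / v.sum())              # normalised log-variance features

train0 = [bandpass(make_epoch(0), 0.5, 8) for _ in range(40)]
train1 = [bandpass(make_epoch(1), 0.5, 8) for _ in range(40)]
filt = csp_filters(train0, train1)
m0 = np.mean([features(e, filt) for e in train0], axis=0)
m1 = np.mean([features(e, filt) for e in train1], axis=0)

def predict(epoch):                         # nearest class mean in feature space
    f = features(bandpass(epoch, 0.5, 8), filt)
    return int(np.linalg.norm(f - m1) < np.linalg.norm(f - m0))

acc = np.mean([predict(make_epoch(c)) == c for c in [0, 1] * 25])
```

A full FBCSP system repeats the CSP step per frequency band and concatenates the selected features before the SVM; this sketch uses a single 0.5-8 Hz band to keep the idea visible.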





Emphasize the technical novelty of our upper bound and lower bound, as Reviewer #1, Reviewer #3, and Reviewer #4 commented on the technical novelty of our theoretical results

Neural Information Processing Systems

We thank all the reviewers for their valuable feedback and for appreciating our contributions. Technical novelty of the upper bound. In the exploration phase, Jin et al. [2020] set reward to be To our knowledge, this idea is new in the literature. For example, for the hard instance in [Du et al., 2020], only a single state-action pair has non-zero reward. Moreover, we focus on the reward-free setting while Du et al. [2020] focused on the standard RL setting. Below we address specific concerns from each reviewer.


Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery

Xu, Licong, Sarkar, Milind, Lonappan, Anto I., Zubeldia, Íñigo, Villanueva-Domingo, Pablo, Casas, Santiago, Fidler, Christian, Amancharla, Chetana, Tiwari, Ujjwal, Bayer, Adrian, Ekioui, Chadi Ait, Cranmer, Miles, Dimitrov, Adrian, Fergusson, James, Gandhi, Kahaan, Krippendorf, Sven, Laverick, Andrew, Lesgourgues, Julien, Lewis, Antony, Meier, Thomas, Sherwin, Blake, Surrao, Kristen, Villaescusa-Navarro, Francisco, Wang, Chi, Xu, Xueqing, Bolliet, Boris

arXiv.org Artificial Intelligence

We present a multi-agent system for automation of scientific research tasks, cmbagent (https://github.com/CMBAgents/cmbagent). The system is formed by about 30 Large Language Model (LLM) agents and implements a Planning & Control strategy to orchestrate the agentic workflow, with no human-in-the-loop at any point. Each agent specializes in a different task (performing retrieval on scientific papers and codebases, writing code, interpreting results, critiquing the output of other agents) and the system is able to execute code locally. We successfully apply cmbagent to carry out a PhD level cosmology task (the measurement of cosmological parameters using supernova data) and evaluate its performance on two benchmark sets, finding superior performance over state-of-the-art LLMs. The source code is available on GitHub, demonstration videos are also available, and the system is deployed on HuggingFace and will be available on the cloud.
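The Planning & Control orchestration described above can be sketched as a plan-then-dispatch loop. The agent roles, plan format, and stub agent functions below are illustrative assumptions; in cmbagent the planner and specialists are LLM-backed (see its GitHub repository), not hard-coded functions.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a Planning & Control loop: a planner decomposes the
# task into agent-tagged steps, then a controller dispatches each step to a
# specialist agent with no human in the loop. Agent names are assumptions.
@dataclass
class Step:
    agent: str
    instruction: str
    result: str = ""

@dataclass
class Plan:
    steps: list = field(default_factory=list)

def planner(task):
    # Stand-in for an LLM planner: produce agent-tagged steps for the task.
    return Plan([Step("researcher", f"survey literature for: {task}"),
                 Step("engineer", f"write analysis code for: {task}"),
                 Step("executor", "run the code locally"),
                 Step("critic", "review results and flag issues")])

# Stand-ins for specialist LLM agents (retrieval, coding, execution, critique).
AGENTS = {
    "researcher": lambda s: f"notes({s.instruction})",
    "engineer":   lambda s: f"code({s.instruction})",
    "executor":   lambda s: "output: ok",
    "critic":     lambda s: "approved",
}

def control(plan):
    # Control phase: execute the plan step by step, recording each result.
    for step in plan.steps:
        step.result = AGENTS[step.agent](step)
    return plan

done = control(planner("measure cosmological parameters from supernova data"))
```

In the real system each dispatch is an LLM call (possibly with tool use and local code execution), and the critic's feedback can trigger replanning rather than ending the loop.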


Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning

Liu, Yongshuai, Liu, Xin

arXiv.org Artificial Intelligence

Model-based reinforcement learning (MBRL) has demonstrated superior sample efficiency compared to model-free reinforcement learning (MFRL). However, the presence of inaccurate models can introduce biases during policy learning, resulting in misleading trajectories. The challenge lies in obtaining accurate models due to limited diverse training data, particularly in regions with limited visits (uncertain regions). Existing approaches passively quantify uncertainty after sample generation, failing to actively collect uncertain samples that could enhance state coverage and improve model accuracy. Moreover, MBRL often faces difficulties in making accurate multi-step predictions, thereby impacting overall performance. To address these limitations, we propose a novel framework for uncertainty-aware policy optimization with model-based exploratory planning. In the model-based planning phase, we introduce an uncertainty-aware k-step lookahead planning approach to guide action selection at each step. This process involves a trade-off analysis between model uncertainty and value function approximation error, effectively enhancing policy performance. In the policy optimization phase, we leverage an uncertainty-driven exploratory policy to actively collect diverse training samples, resulting in improved model accuracy and overall performance of the RL agent. Our approach offers flexibility and applicability to tasks with varying state/action spaces and reward structures. We validate its effectiveness through experiments on challenging robotic manipulation tasks and Atari games, surpassing state-of-the-art methods with fewer interactions, thereby leading to significant performance improvements.
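The k-step lookahead with an uncertainty trade-off described in the abstract can be sketched on a toy 1-D control task. The ensemble-disagreement uncertainty measure, the penalty weight `beta`, and the hold-still tail policy below are assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

# Illustrative uncertainty-aware k-step lookahead: score each candidate first
# action by its mean k-step return under an ensemble of dynamics models, minus
# a penalty on ensemble disagreement (a proxy for model uncertainty).
rng = np.random.default_rng(1)
ACTIONS = (-1.0, 0.0, 1.0)

def reward(s):
    return -abs(s - 1.0)        # goal: drive the state toward s = 1

# Ensemble of "learned" dynamics models (here: perturbed copies of s' = s + 0.1a).
ensemble = [lambda s, a, e=eps: s + (0.1 + e) * a
            for eps in rng.normal(0, 0.01, size=5)]

def lookahead(s0, k=3, beta=1.0):
    best_a, best_score = None, -np.inf
    for a0 in ACTIONS:
        returns, finals = [], []
        for model in ensemble:
            s, ret = s0, 0.0
            for t in range(k):
                a = a0 if t == 0 else 0.0   # assumed tail policy: hold still
                s = model(s, a)
                ret += reward(s)
            returns.append(ret)
            finals.append(s)
        # Trade-off: expected return vs. disagreement across ensemble members.
        score = np.mean(returns) - beta * np.std(finals)
        if score > best_score:
            best_a, best_score = a0, score
    return best_a

a = lookahead(0.0)              # from s = 0, moving toward the goal should win
```

The paper's second ingredient, the uncertainty-driven exploratory policy, would instead *seek* high-disagreement regions during data collection (flipping the sign of the penalty) so the ensemble's weak spots get more training samples.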